Implement LMDB-based multi-modal cache #30373
petersalas wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request introduces a new LMDB-based multi-modal cache, a significant feature enabling caching across multiple API server workers or engine processes on the same machine. The implementation is well-designed, featuring a dedicated evictor process with an adaptive strategy, object chunking to prevent fragmentation, and transaction management to ensure data consistency while minimizing lock contention. The changes are well-integrated into the existing caching framework, and the new functionality is accompanied by a solid set of tests. I've identified one minor issue regarding unreachable code, which is detailed in a specific comment. Overall, this is a high-quality and well-executed contribution.
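The PR itself doesn't show the chunking code in this conversation, but the general idea the review describes — splitting a large cached object into fixed-size chunks stored under derived keys so no single record fragments the store — can be sketched in pure Python. Everything below is hypothetical: the dict stands in for the real LMDB environment, and the key scheme, `CHUNK_SIZE`, and function names are invented for illustration.

```python
# Hypothetical sketch of "object chunking to prevent fragmentation":
# a large value is split into fixed-size chunks stored under derived
# keys ("<key>:<index>"), plus a small metadata entry with the chunk
# count. The dict stands in for the real LMDB environment.

CHUNK_SIZE = 4  # tiny for illustration; a real cache would use e.g. 1 MiB

def put_chunked(store, key, value):
    chunks = [value[i:i + CHUNK_SIZE] for i in range(0, len(value), CHUNK_SIZE)]
    store[f"{key}:meta"] = len(chunks)  # record how many chunks to read back
    for idx, chunk in enumerate(chunks):
        store[f"{key}:{idx}"] = chunk

def get_chunked(store, key):
    n = store.get(f"{key}:meta")
    if n is None:
        return None  # cache miss
    return b"".join(store[f"{key}:{i}"] for i in range(n))

store = {}
put_chunked(store, "img", b"0123456789")
assert get_chunked(store, "img") == b"0123456789"
assert get_chunked(store, "missing") is None
```

In a real LMDB-backed store this matters because very large values churned through an append/evict workload tend to fragment the memory map; fixed-size chunks keep freed pages reusable.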
Hi @petersalas, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
DarkLight1337 left a comment:

Some very initial comments
This pull request has merge conflicts that must be resolved before it can be merged.
Heads-up that I have implemented the refactor to the cache factories in #32382, and have added you as co-author.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Signed-off-by: Peter Salas <peter@fixie.ai>
As one might reasonably expect, the LMDB cache is marginally slower than the shm cache in a microbenchmark, but in exchange it supports API-server scale-out, which can significantly improve tail latency in multi-modal-heavy inference scenarios.
Asked @ywang96 to take a look since I already did a pass earlier
Purpose
This implements an LMDB-based multi-modal item cache which supports LRU-based eviction, multiple API server workers, and/or multiple Engine processes.
`BaseMultiModalProcessorCache`/`BaseMultiModalReceiverCache` now include a `begin()` method which (for the LMDB implementation) uses a transaction. In the sender-side cache, writes are queued outside of the transaction scope so that processing/serialization all occur outside of the scope of the write transaction (LMDB has single-writer semantics).

One caveat with this implementation is that any item that hasn't been used within a fixed time window (`--mm-lmdb-cache-min-eviction-age`, defaults to 600 seconds) may be evicted, even if there are queued requests still depending on those items (the worker's `execute_model` will raise in that case). A future improvement could be to instead track the oldest active request in each of the frontends (`OutputProcessor` seemingly already has this) and use that instead.

Test Plan
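The write pattern and eviction caveat described above can be illustrated with a toy, pure-Python analogue. This is a sketch under stated assumptions, not the PR's implementation: a `threading.Lock` stands in for LMDB's single-writer lock, a dict stands in for the database, and `SketchCache`, `queue_put`, and `evict_older_than` are invented names. The key points it demonstrates are that serialization happens before the write "transaction" is entered, and that anything unused for longer than the minimum eviction age is fair game.

```python
# Toy analogue of the described LMDB cache; all names are invented.
import pickle
import threading
import time
from contextlib import contextmanager

class SketchCache:
    def __init__(self):
        self._store = {}          # stand-in for the LMDB database
        self._last_used = {}      # per-key last-access timestamps
        self._writer_lock = threading.Lock()  # stand-in for LMDB's single writer
        self._pending = []        # writes queued outside the transaction

    def queue_put(self, key, item):
        # Serialize *before* the writer lock is taken, mirroring
        # "writes are queued outside of the transaction scope".
        self._pending.append((key, pickle.dumps(item)))

    @contextmanager
    def begin(self):
        # Analogue of the begin() method: hold the write "transaction"
        # only long enough to flush already-serialized writes.
        with self._writer_lock:
            yield
            now = time.monotonic()
            for key, blob in self._pending:
                self._store[key] = blob
                self._last_used[key] = now
            self._pending.clear()

    def get(self, key):
        blob = self._store.get(key)
        if blob is None:
            return None
        self._last_used[key] = time.monotonic()
        return pickle.loads(blob)

    def evict_older_than(self, min_age_s):
        # Anything unused for min_age_s (cf. --mm-lmdb-cache-min-eviction-age)
        # may be evicted, even if a queued request still depends on it.
        cutoff = time.monotonic() - min_age_s
        for key in [k for k, t in self._last_used.items() if t < cutoff]:
            del self._store[key], self._last_used[key]

cache = SketchCache()
cache.queue_put("mm_item", {"pixels": [1, 2, 3]})
with cache.begin():
    pass  # reads could happen here; queued writes flush on exit
assert cache.get("mm_item") == {"pixels": [1, 2, 3]}
cache.evict_older_than(-1.0)  # negative age forces eviction in this toy
assert cache.get("mm_item") is None
```

The real implementation has to do more (an evictor process, chunked values, transaction retries), but the shape — expensive work outside the writer lock, cheap puts inside it, age-based eviction independent of in-flight requests — matches the description in the PR.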
Tested across permutations of API/Tensor/Data parallelism.
(Happy to run any suggested benchmarks as well!)
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.